Abstract
Background: The use of electronic medical records (EMRs) and clinical registries has transformed health care delivery by improving data management, care coordination, and research capacity. However, the full potential of these technologies can only be realized through effective interoperability, thereby reducing the burden of manual data entry and enhancing the use of real-world clinical data.
Objective: This review examines technologies that enable automated data extraction and transfer, which promote interoperability between EMRs and clinical registries.
Methods: The review protocol was registered with Open Science Framework a priori. PubMed, CINAHL, Embase, and Web of Science were searched for studies published between January 2013 and April 2025 using three key concepts: (1) “registry,” (2) “electronic medical records,” and (3) “interoperability.” A 2-phase screen identified studies evaluating technologies that facilitate automated data extraction or interoperability. Automation was defined as fully automated, where data are extracted and transferred without human intervention, or semiautomated, where extraction or transfer is predominantly automated but may include manual validation. Only technologies supporting ongoing database integration were eligible for inclusion. Screening, data extraction, and synthesis were conducted by multiple independent reviewers. Technology experts provided extensive input and guidance throughout to ensure the accuracy and relevance of the extracted information.
Results: Overall, 36 studies met the inclusion criteria, representing 12 countries across 5 continents and addressing a wide range of acute and chronic health conditions. Epic was the most frequently reported EMR system, while the most common registry platforms were REDCap (Research Electronic Data Capture; Vanderbilt University), structured query language (SQL) server database, and EMR-embedded solutions. Most approaches centered around extracting data from structured formats (n=18), or a combination of both structured and unstructured formats (n=10), emphasizing the central role of structured EMR data in current automated extraction approaches.
Conclusions: This review advances understanding of interoperability between EMRs and clinical registries by uniquely examining automated and sustainable solutions for data exchange, extending beyond prior work that has largely focused on technologies designed for isolated systems or study-specific data extraction. A novel contribution of this review is the synthesis of context-specific considerations derived from reported implementations, providing a comprehensive overview of how technology selection and implementation are shaped by the context in which they are deployed. While these advancements have reduced reliance on inefficient, error-prone, and resource-intensive manual processes, ongoing challenges in data standardization, seamless integration, and long-term sustainability are compounded by poor and inconsistent reporting across studies. Future efforts should follow comprehensive reporting guidelines, adhere to robust governance principles, and incorporate implementation science frameworks, to not only enable meaningful comparison and synthesis in future research, but also to ensure that technologies can be effectively, feasibly, and sustainably integrated within health care contexts, while upholding the ethical and equitable use of health care data.
doi:10.2196/82380
Introduction
In health care, the integration of innovative technology has driven unprecedented progress in communication and information systems, resulting in significant impacts on the efficiency, accuracy, and quality of health care delivery []. Electronic medical records (EMRs) and clinical registries exemplify the sophisticated application of technology to streamline data management, enhance patient care coordination, and support robust clinical research []. Clinical registries in particular are increasingly leveraged for research, quality assurance, and benchmarking, enabling health care providers and researchers to maximize resources and improve patient outcomes in an era of growing demands and limited funding [,]. However, the full potential of these systems can only be realized through effective interoperability, which ensures seamless data exchange and integration across diverse health care platforms [].
Interoperability is broadly defined as “the ability of two or more systems to work together, regardless of different interfaces, platforms, and technologies adopted” in a way that facilitates data-sharing and data-use in improving health care delivery []. In this context, interoperability between patient EMRs and clinical registries offers enormous potential to minimize the burden of data input, management, and maintenance, while maximizing research productivity in line with real-world experiences [-]. Realizing this potential is hindered not only by technical limitations but also by the need to maintain information fidelity alongside data standardization, ensure data privacy and security, and navigate regulatory constraints [,]. Overcoming these challenges is pivotal to harnessing the full potential of registries and EMRs, moving toward a more integrated and efficient health care technology ecosystem.
In contemporary health care, EMRs play a crucial role in capturing comprehensive patient data across various clinical encounters []. The data, documented in either a structured format (eg, predefined fields such as diagnosis codes, lab results, and medication lists) or an unstructured format (eg, free-text clinical notes, summaries, and narrative reports) [], facilitate patient care by centralizing all patient information in one location. This centralization keeps multidisciplinary team members informed, supports clinical decision-making, and ensures continuity of care across diverse health care settings [,]. Typically, these same data elements are also leveraged in clinical registries to systematically collect, study, and interpret patient data relevant to specific medical conditions [,], thus filling broad knowledge and evidence gaps that are challenging and costly to capture using traditional research methods [,]. At present, the population of registry data predominantly relies on manual methods, where data are extracted from EMRs and transcribed into the correct format for registry use [-]. Despite the translational benefits of clinical registry data use, this current process is time-consuming, error-prone, resource-intensive, and often constrained by the availability of ongoing funding [-].
Automated data extraction and transfer technologies have emerged as critical tools to improve interoperability, reduce data entry burden, and enhance the timeliness and completeness of health information for research, quality improvement, and policy development [,]. However, while these technological advancements offer significant benefits, there is currently no universally accepted “gold standard” methodology, and challenges surrounding adoption are not yet fully understood. Successful integration of these tools requires not only technical accuracy in extracting and transmitting standardized data, but also robust measures to protect patient privacy, ensure data security, and comply with regulatory requirements [,,]. Adoption is further shaped by organizational, technical, and cultural barriers, underscoring the value of implementation science frameworks to guide effective and sustainable integration [,]. Within this context, this review primarily aims to explore existing automated data extraction and transfer processes from EMRs to clinical registries, with a focus on technologies used, data fidelity, and governance.
The specific review questions are:
- What technologies are being used to facilitate interoperability through automated data extraction and transfer from EMRs to clinical registries or databases?
- To what degree has the implementation of automated data extraction and transfer tools successfully enabled accurate and complete transmission of data in a standardized format?
- What measures have been implemented to safeguard data privacy, ensure security, and comply with regulatory requirements during automated data extraction and transfer?
- What implementation challenges have been reported in the adoption of these technologies, and what implementation frameworks have been used to support their effective integration into clinical data systems?
Methods
Study Design
The PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guideline was used to guide the conduct and reporting of this review [,]. The review protocol was registered with Open Science Framework prior to study commencement [].
Search Strategy
The search strategy was developed in consultation with a university health librarian to find published studies relevant to the review questions. The search strategy comprised three key concepts: (1) registry, (2) electronic health records (EHR), and (3) interoperability (Table S1 in ). While both terms, EMR and EHR, were included in the search strategy to ensure comprehensiveness, this study uses EMR to collectively refer to both terms, except where the original technology name explicitly uses EHR.
The Systematic Review Accelerator Polyglot Search Translator automation tool [] was used to convert this search strategy into the appropriate formats for 4 databases, including PubMed, CINAHL, Embase, and Web of Science. The reference lists of all included sources were hand-searched. Non-indexed publications were considered if identified through reference list hand-searching, but were not systematically sought beyond the selected databases. Gray literature was neither searched nor eligible, as the review focused exclusively on peer-reviewed studies. The search included all studies published between January 1, 2013, and April 30, 2025, inclusive.
Eligibility Criteria
The core concept of interest was any study exploring technology enabling interoperability between EMRs and research databases, such as clinical registries. Automation was defined as either fully automated, in which data are extracted and transferred without human intervention, or semiautomated, in which extraction or transfer is predominantly automated but may include manual validation steps. This ability to automate the process of extracting and transferring data for research purposes was key for inclusion and, as such, any study using a manual extraction or transfer process was excluded. Publications focusing on technology to extract data for a single study, rather than for integration into ongoing databases or registries, were also excluded. Eligibility within this scoping review was not limited by a specific type of participant, population, or health care context. Similarly, the study design and the specific EMR or registry used did not affect inclusion. Only full-text, peer-reviewed, primary research studies available in English were considered for inclusion, ensuring sufficient detail on the relevant technology was available.
Study Selection
Studies were uploaded into Covidence (Covidence systematic review software) for screening and de-duplication []. Four independent screeners (ED, JB, RM, and KA) completed a title and abstract screen. Conflicts were resolved by discussion between 2 reviewers (ED and JB). Once all conflicts were resolved, the same screeners independently completed the full-text screen, recording the rationale for study exclusion to ensure transparency.
Data Extraction, Synthesis, and Quality Appraisal
Data were entered into Microsoft Excel using a data extraction tool developed a priori []. The data extracted are presented in Table S1 in .
Data from each paper were extracted by at least 2 reviewers (ED, JB, RM, and KA), with 2 additional technology experts (JS and RM) contributing to the extraction process and providing technological guidance. Quality appraisal of individual sources was not conducted, as inclusion was based solely on relevance to the review aims, regardless of methodological rigor or risk of bias. This approach aligns with PRISMA-ScR guidance and reflects the broader intent of scoping reviews to provide a comprehensive overview of existing literature while mapping the breadth and nature of available evidence [].
Results
Overview
The search yielded 12,815 studies for screening. Overall, 36 studies [,-,-] met the inclusion criteria and were included in the final review. A summary of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow chart is displayed in .

Study Characteristics
A total of 36 heterogeneous studies [,-,-] were included in the review (Table S1 in ). Twelve countries from 5 continents were represented, with the majority of studies originating from the United States. Twenty studies [,,,,-] were single-site. The studies addressed a wide range of acute and chronic health conditions, though 3 studies [,,] did not specify the registry’s clinical focus. Patient data were extracted from between 50 and 1.4 million patients per study, with 16 studies [,,,,,-,-,] not clearly reporting the number of unique patients included. The predominant health care setting type was outpatient care (n=7) [,,,,,,], followed by a combination of settings (n=6) [-,,,], inpatient care (n=4) [,,,], surgical (n=3) [,,], and telehealth (n=1) []; 15 studies [,,,-,,,,,,,] did not clearly describe the setting type. The predominant study designs were descriptive technical reports (n=20) [-,,,,,,-,,,-], which detailed the development or integration of automated extraction technologies with EMR and registry systems, followed by pilot validation studies (n=13) [,,-,,,-,,], which evaluated data extraction systems on a smaller scale to assess accuracy or feasibility.
A variety of EMRs were used, with Epic (Epic Systems Corporation) being the most frequently used system, appearing in 12 studies [,,,,,,,,,,,]. Eight studies [,,,,,-] did not specify the system used, with an additional 4 studies [-] reporting that multiple EMR solutions were used without specifying which systems. Similarly, 12 studies [,,-,,-,,] did not specify the registry or database systems used. The most commonly used database systems included REDCap, structured query language (SQL) server database, and EMR-embedded solutions.
Technological Approaches to Data Extraction and Transfer
Various technologies were used to facilitate the automated extraction and transfer of data from EMRs onto clinical registries (Table S1 in ), with approaches often tailored to align with data formats and institutional requirements. provides a schematic overview of the main types of technological solutions identified in this review, and how they interface with EMRs and clinical registries. A total of 18 studies [,-,,,-,,-,,,] extracted data from structured formats, 10 studies [,,,-,,,,] extracted data from mixed formats (structured and unstructured), 6 studies [,,,,,] extracted data from unstructured formats, and the remaining 2 studies [,] did not report the format of the data source.

Data extraction from structured fields was most commonly performed using SQL querying methods, as reported explicitly in Abu-Rish Blakeney et al [] (SQL database extraction, near real-time transfer), Bodagh et al [] (SQL queries via Cerner Health Information Exchange and National Health Service Data Transfer Services, near real-time), and Kannan et al [] (custom SQL-based extract, transform, load [ETL] pipeline, weekly extraction). Similar SQL-based approaches were inferred to have been used by Kariuki et al [] and Nasir et al [], based on their description of SQL-driven ETL pipelines, although detailed technical implementation was not fully described.
SQL enables automated extraction of clinical data from EMR databases by querying structured fields such as diagnosis codes, laboratory results, and medication lists. These data elements are typically stored in relational tables within the EMR. SQL scripts are written to select specific fields based on predefined criteria (eg, diagnosis and date range) and export them in a standardized format for registry ingestion.
Queries can be scheduled to run at regular intervals (eg, daily or weekly), supporting repeatable, automated data extraction without manual handling. In many cases, the extracted datasets are further processed through ETL pipelines to harmonize field names, apply validation rules, and align data structures with the registry schema, ensuring interoperability. ETL pipelines automate the process of moving data from source systems like EMRs into target systems such as registries by first extracting relevant data, then transforming it to meet predefined standards (eg, renaming fields, standardizing units, and applying validation rules), and finally loading it into the registry database in the correct format.
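The extract, transform, and load steps described above can be illustrated with a minimal sketch. It does not reproduce any included study's pipeline: the table names (`emr_labs`, `registry_glucose`), field names, and sample rows are hypothetical, and an in-memory SQLite database stands in for both the EMR and the registry.

```python
import sqlite3

# Approximate conversion factor for glucose (mmol/L to mg/dL).
MGDL_PER_MMOLL_GLUCOSE = 18.016


def build_demo_emr(conn):
    """Create a hypothetical EMR lab table with illustrative rows."""
    conn.execute(
        "CREATE TABLE emr_labs (patient_id TEXT, dx_code TEXT, "
        "test_name TEXT, value REAL, unit TEXT, collected_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO emr_labs VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("P1", "E11.9", "glucose", 126.0, "mg/dL", "2024-03-01"),
            ("P2", "E11.9", "glucose", 7.0, "mmol/L", "2024-03-02"),
            ("P3", "I10", "glucose", 90.0, "mg/dL", "2024-03-03"),
            ("P4", "E11.9", "glucose", 110.0, "mg/dL", "2023-01-15"),
            ("P5", "E11.9", "glucose", None, "mg/dL", "2024-03-05"),
        ],
    )


def extract(conn, dx_code, start, end):
    """Extract: select structured fields by diagnosis code and date range."""
    cur = conn.execute(
        "SELECT patient_id, value, unit, collected_on FROM emr_labs "
        "WHERE dx_code = ? AND collected_on BETWEEN ? AND ? "
        "ORDER BY patient_id",
        (dx_code, start, end),
    )
    return cur.fetchall()


def transform(rows):
    """Transform: rename fields, standardize units, apply a validation rule."""
    accepted, rejected = [], []
    for patient_id, value, unit, collected_on in rows:
        if value is None:  # validation rule: a result value is required
            rejected.append((patient_id, "missing value"))
            continue
        if unit == "mmol/L":
            value = round(value * MGDL_PER_MMOLL_GLUCOSE, 1)
        accepted.append(
            {"subject_id": patient_id, "glucose_mg_dl": value,
             "sample_date": collected_on}
        )
    return accepted, rejected


def load(conn, records):
    """Load: write harmonized records into the (hypothetical) registry table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS registry_glucose "
        "(subject_id TEXT, glucose_mg_dl REAL, sample_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO registry_glucose "
        "VALUES (:subject_id, :glucose_mg_dl, :sample_date)",
        records,
    )
    conn.commit()
```

In practice, a scheduler (eg, a daily or weekly job) would call `extract`, `transform`, and `load` in sequence, and rejected rows would be routed to manual review rather than discarded.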
Built-in EMR functionalities, such as Epic Clarity and embedded data mart solutions, were used for automated routine extraction and transfer by Milinovich and Kattan [] (Epic Clarity, weekly extraction), Nathan et al [] (Epic Clarity, daily extraction), Mou et al [] (Epic Clarity, daily extraction), and Salati et al [] (built-in EMR functionality, near real-time updates).
For unstructured or mixed-format data sources, natural language processing (NLP) methodologies were applied to automate the extraction of clinical information from free-text documentation. NLP automates the extraction of information from unstructured free-text fields in EMRs by converting narrative data into structured formats suitable for registry ingestion.
Two common NLP techniques are named entity recognition (NER) and pattern matching. NER algorithms identify and classify clinical concepts, such as diagnoses, medications, or procedures, often mapping them to standardized terminologies such as ICD-10 (International Statistical Classification of Diseases and Related Health Problems, Tenth Revision) or SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms), enabling consistent structuring of extracted data. Pattern matching, by contrast, uses predefined rules or regular expressions to detect specific text patterns (eg, medication names, dosages, and dates) without requiring machine learning models.
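As an illustration of the pattern-matching approach only (not NER, and not the method of any included study), the following sketch uses regular expressions to pull medication mentions and ISO-formatted dates out of a free-text note. The medication list, the accepted dose units, and the date format are simplifying assumptions; a production system would rely on a curated drug vocabulary and handle many more surface forms.

```python
import re

# Hypothetical medication vocabulary, for illustration only.
MEDICATIONS = ["metformin", "aspirin", "atorvastatin"]

# Drug name, followed by a numeric dose and a unit (eg, "metformin 500 mg").
MED_PATTERN = re.compile(
    r"\b(?P<drug>" + "|".join(MEDICATIONS) + r")\s+"
    r"(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g)\b",
    re.IGNORECASE,
)

# ISO-style dates (YYYY-MM-DD); other date formats are not handled here.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_medications(note):
    """Return structured medication mentions found in a free-text note."""
    return [
        {
            "drug": m.group("drug").lower(),
            "dose": float(m.group("dose")),
            "unit": m.group("unit").lower(),
        }
        for m in MED_PATTERN.finditer(note)
    ]


def extract_dates(note):
    """Return all ISO-formatted dates found in a free-text note."""
    return DATE_PATTERN.findall(note)
```

Applied to a note such as "Seen on 2024-03-01. Started Metformin 500 mg twice daily; continue aspirin 81mg.", the functions yield two structured medication entries and one date, each ready to be mapped onto a predefined registry field.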
In EMR-to-registry integration, these methods allow clinically relevant information to be automatically detected, extracted, and formatted into predefined registry fields, reducing the need for manual chart review and enabling large-scale, systematic use of free-text clinical documentation. Bacchi et al [] explicitly described using Python-based NLP libraries to extract stroke metrics, while Heider et al [] reported use of Apache Unstructured Information Management Architecture for COVID-19 data extraction, with daily updates. Munzone et al [] used a rule-based NLP algorithm, and Mou et al [] used open- and closed-source large language models (LLMs) for oncology data extraction. Tavabi et al [] developed a lightweight NLP pipeline to build orthopedic registries from free-text clinical notes. Wang et al [] deployed an LLM (ChatGLM) to automate real-world data extraction in Chinese hospital settings.
Several studies extracted both structured and unstructured data, using a combination of technologies [,,,-,,,,]. Two studies [,] did not clearly specify the technologies used for data extraction. One study [] described the use of a centralized repository for aggregation of primary care data, but did not detail the specific technological mechanisms for extraction and transfer.
Middleware and standardized interoperability frameworks were also used in many studies. Nakagawa et al [] and Sugiyama et al [] used the Standardized Structured Medical Information eXchange version 2 (SS-MIX2) middleware framework to enable standardized extraction across hospital systems. SS-MIX2 is a standardized middleware framework developed in Japan that enables the extraction, storage, and sharing of structured clinical data from EMRs by converting it into a unified format, supporting interoperability across different health care systems. In terms of interoperability frameworks, Cheng et al [] used REDCap Clinical Data Interoperability Services (CDIS) with daily data transfer via a FHIR (Fast Healthcare Interoperability Resources)-based application programming interface (API). CDIS enables automated extraction of structured clinical data from EMRs by packaging information into standardized FHIR bundles. FHIR organizes health care data into modular, machine-readable resources, while APIs provide the secure connection that transmits these resources to the target registry.
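To make the notion of modular, machine-readable FHIR resources concrete, the sketch below flattens the Observation resources in a FHIR Bundle into registry-style rows. It is not the REDCap CDIS implementation: the bundle is a hand-written illustrative sample, the output field names are hypothetical, and the authenticated API call that would normally retrieve the bundle is omitted.

```python
import json

# Illustrative FHIR Bundle; identifiers and values are made up.
SAMPLE_BUNDLE = json.loads("""
{
  "resourceType": "Bundle",
  "type": "searchset",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "example-1"}},
    {"resource": {
      "resourceType": "Observation",
      "status": "final",
      "code": {"coding": [{"system": "http://loinc.org", "code": "4548-4"}]},
      "subject": {"reference": "Patient/example-1"},
      "effectiveDateTime": "2024-03-01",
      "valueQuantity": {"value": 7.2, "unit": "%"}
    }}
  ]
}
""")


def bundle_to_registry_rows(bundle):
    """Flatten Observation resources into rows with hypothetical field names."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Observation":
            continue  # other resource types are ignored in this sketch
        coding = (resource.get("code", {}).get("coding") or [{}])[0]
        quantity = resource.get("valueQuantity", {})
        subject_ref = resource.get("subject", {}).get("reference", "")
        rows.append(
            {
                "subject_id": subject_ref.split("/")[-1],
                "code_system": coding.get("system"),
                "code": coding.get("code"),
                "value": quantity.get("value"),
                "unit": quantity.get("unit"),
                "observed_on": resource.get("effectiveDateTime"),
            }
        )
    return rows
```

Because each resource carries its own coding system and units, the receiving registry can validate and map incoming values without knowledge of the source EMR's internal table layout.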
Technologies used to facilitate the transfer of data onto clinical registries varied across studies, with APIs and built-in EMR functionalities being the most commonly reported. Direct API integrations (eg, FHIR APIs) were reported in Stevens et al [], Cheng et al [], and Goel et al [] to support structured and secure registry population, while Pittman et al [] leveraged the Informatics for Integrating Biology & the Bedside (i2b2) middleware platform for scheduled weekly updates via an API. In other studies, such as Pan et al [] and Salati et al [], the specific transfer protocol was not explicitly stated; however, the use of embedded EMR functions suggests that internal secure batch transfers were used.
The frequency of data transfer varied depending on the system architecture, institutional requirements, and operational considerations, ranging from near real-time updates to intervals as long as 6 months. Near real-time updates were achieved in studies such as Abu-Rish Blakeney et al [], Bodagh et al [], and Salati et al [], particularly where SQL or built-in EMR pipelines were leveraged. Daily transfers were reported by Nathan et al [], Mou et al [], and Heider et al []. Weekly or monthly batch processes were described in Kannan et al [] and Nasir et al [], while longer biannual update cycles were used in Garies et al [], reflecting resource constraints. One study customized the frequency of data transfer based on an assigned schedule (eg, daily, weekly, or monthly) for each patient []. Twelve studies [,,,,,,,-] did not specify the frequency of transfers used, and 17 studies [,,,,,,-,,,,,,-] did not explicitly detail the technology used to transfer data onto the clinical registry.
Evaluating Data Quality in Automated EMR-to-Registry Technologies
Data quality was assessed over three key domains: the completeness of data input, the accuracy of data extracted, and the semantic consistency of data transferred (Table S1 in ) []. Twenty-one studies [-,,,,,,-,,,,,,] evaluated the completeness of data inputted into EMR systems. Among these, 13 studies [-,,-,,,,,] extracted exclusively from structured formats, 5 studies [,,,,] derived data from mixed formats, 2 studies [,] derived data from unstructured formats, and 1 study [] did not specify the data format. The most common methods of assessing data completeness post input were manual validation, completion rate assessments, and use of mandatory fields within structured data capture tools. Of these studies, only 8 quantified completeness of data input as a percentage, with completeness ranging from 50% to 100% [,,,,,-]. One study [] presented completeness between structured and unstructured formats, finding 100% completeness for structured data versus less than 20% for unstructured data. The remaining studies did not report completeness of data input.
Thirty studies [-,,,-,,-,-] evaluated the accuracy of technology-driven data extraction. Of these, only 10 studies [,,,,,,,,,] expressed accuracy as a percentage, all demonstrating performance exceeding 90%, irrespective of data format. One study [] further assessed accuracy using sensitivity and specificity metrics, while another compared accuracy of data extraction between open- and closed-source large language models []. Manual validation emerged as the predominant method for assessing extraction accuracy.
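The completeness and accuracy measures reported above can be expressed in a few lines. The sketch below is illustrative only: the included studies did not share a common formula, so the definitions used here (proportion of non-empty values; item-by-item agreement with a manually validated reference; sensitivity and specificity for a binary variable) are assumptions.

```python
def completeness(records, field):
    """Percentage of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return 100.0 * filled / len(records)


def accuracy(extracted, reference):
    """Percentage of extracted values that match the manually validated
    reference, compared item by item."""
    if len(extracted) != len(reference):
        raise ValueError("extracted and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(1 for e, r in zip(extracted, reference) if e == r)
    return 100.0 * matches / len(reference)


def sensitivity_specificity(predicted, actual):
    """Sensitivity and specificity for a binary extracted variable, with
    `actual` taken from manual chart review."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

Reporting the denominator alongside each percentage (eg, number of records or variables assessed) would make such figures comparable across studies.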
Several studies described alternative methods of assessing data quality that did not align with the predefined metrics summarized in Table S1 in . One study applied a qualitative research-grade rating, incorporating both accuracy and completeness, without explicitly reporting percentages []. One study used rejection rates of XML files due to validation errors (eg, missing or inaccurate data) as an indicator of data quality []. Several studies used mandatory fields within structured data capture forms, implying full completeness of data input [,,,]. Three studies [,,] reported data quality at the variable or domain level, such as the availability of key laboratory values or the presence of diagnosis codes, rather than providing comprehensive dataset-level metrics. In one study, data quality was assessed through subjective user perception rather than an objective measurement [].
In addition, a variety of methods were used to maintain semantic consistency post transfer into the clinical registry. Of the 21 studies [-,,,-,-,-,-] that reported a method, the most commonly used approach was the application of standardized terminologies and coding systems. This method was reported in 11 studies [,,,,,,,,-], and most frequently involved ICD-10, Logical Observation Identifiers Names and Codes (LOINC), and SNOMED CT to ensure the consistent interpretation of clinical terms and concepts across systems. Six studies [,,,,,] implemented a shared data dictionary to define data elements and align their definitions across systems. Five studies [,,,,] adhered to data exchange standards, notably Health Level 7, FHIR, CDISC, and JOIA, which define the structure and protocols for encoding and transmitting data between systems to preserve syntactic and semantic integrity. Three studies [,,] applied data models to maintain structural and relational consistency across datasets. One study used standardized nomenclature [], and 5 studies [,,,,] used multiple approaches. The remaining 15 studies [,-,,,,-,,,,] did not specify a method for maintaining semantic consistency post transfer.
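A shared data dictionary of the kind described above can be sketched as a set of field definitions against which every incoming record is harmonized. The field names, permitted values, and local synonyms below are hypothetical and are not drawn from any included study.

```python
# Hypothetical shared data dictionary: one definition per registry field.
DATA_DICTIONARY = {
    "sex": {"type": str, "allowed": {"female", "male", "other", "unknown"}},
    "hba1c_percent": {"type": float, "min": 0.0, "max": 100.0},
}

# Hypothetical site-specific codes mapped onto the dictionary's values.
SOURCE_SYNONYMS = {"sex": {"f": "female", "m": "male"}}


def harmonize(record):
    """Align one source record with the dictionary.

    Returns the cleaned record and a list of validation errors."""
    clean, errors = {}, []
    for field, rule in DATA_DICTIONARY.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
            continue
        if rule["type"] is str:
            value = str(value).strip().lower()
            value = SOURCE_SYNONYMS.get(field, {}).get(value, value)
            if value not in rule["allowed"]:
                errors.append(f"{field}: unrecognized value {value!r}")
                continue
        else:
            try:
                value = float(value)
            except (TypeError, ValueError):
                errors.append(f"{field}: not numeric")
                continue
            if not rule["min"] <= value <= rule["max"]:
                errors.append(f"{field}: out of range")
                continue
        clean[field] = value
    return clean, errors
```

Keeping the dictionary as data rather than code means that participating sites can review and version it independently of the extraction software.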
Approaches to Data Privacy, Security, and Regulatory Compliance
To uphold privacy standards, patient de-identification prior to registry transfer was conducted in 19 studies [,,,,,,,,,-,-] (Table S1 in ). Among these, 15 studies [,,,,,,,-,,,] did not report the method used. The remaining studies used various approaches, including pattern-matching algorithms [], automated server-based anonymization [], use of an EHR-R-REDCap pipeline [], and manual redaction [].
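Because most studies did not describe their de-identification method, the following is offered only as a minimal sketch of what a pattern-matching approach might look like. The identifier patterns are deliberately simplistic assumptions and would miss many real-world identifiers (eg, names and addresses). The keyed-hash pseudonym is an additional technique not reported in the included studies, shown here because registries typically need a stable linkage key once direct identifiers are removed.

```python
import hashlib
import hmac
import re

# Simplistic, illustrative patterns; real de-identification needs far more.
PATTERNS = [
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
]


def redact(text):
    """Replace each matched identifier pattern with a placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def pseudonymize(patient_id, secret_key):
    """Derive a stable pseudonym from a patient identifier using a keyed
    hash (HMAC-SHA256), truncated to 16 hexadecimal characters."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]
```

The secret key would need to be held by the data custodian and never transferred with the registry extract; otherwise the pseudonyms could be regenerated from known identifiers.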
Reported security measures included the use of secure transmission protocols between EMRs and registries, infrastructure and storage protections, and access controls to safeguard registry data. The majority of studies (n=27) [,-,,,-,-,,] did not report a secure transfer method. Seven studies [,,-] reported transferring the data over HTTPS, secure file transfer protocol, or virtual private network, and 2 studies [,] reported applying security protocols such as secure sockets layer to encrypt and protect data during transfer. Two further studies [,] reported using a secure encrypted channel without specifying the method.
Nine studies [,,,,,,-] adopted secure infrastructure and storage measures to protect registry data. These included Health Insurance Portability and Accountability Act (HIPAA)–compliant storage (n=3) [,,], centralized secure housing (n=2) [,], secure cloud infrastructure (n=2) [,], an intranet-based system (n=1) [], and a secure web-based interface (n=1) []. One study stated registry storage was secure, but did not specify the measures taken [].
Methods used to control and authenticate access to registry data included audit logs (n=4) [,,,], role-based access control (n=3) [,,], multifactor authentication (MFA) (n=2) [,], password and USB key security (n=2) [,], token-based access control (n=1) [], single sign-on (n=1) [], and permission set access control (n=1) []. While some studies implemented multiple measures (n=6) [,,,,,], others restricted access without specifying the methods used (n=8) [,,,,,,,].
Various measures were implemented to ensure registry regulatory compliance, including participant consent procedures, ethical oversight, restrictions on data use, and adherence to regulatory standards (Table S1 in ). Consent procedures were reported in 9 studies [,,,-,-], including a waiver of consent (n=5) [,,-], opt-out procedures (n=3) [-], and explicit opt-in consent (n=1) []. Ethics approval for the registry was obtained in 23 studies [,,,,,,,,,,,,,,-,-], with one study receiving an exception []. To restrict registry data use, institutional review board (IRB) oversight (n=19) [,,,,,,,,,,,-,-,], research approval processes (n=8) [,,,-], and data sharing agreements (n=5) [,,,,] were used.
Only a few studies referenced compliance with jurisdiction-specific regulatory standards. Of the studies conducted in the United States, only 4 explicitly reported adherence to HIPAA [,,,]. One study based in Japan cited compliance with the Act on the Protection of Personal Information (APPI) [], while one European study referenced alignment with the General Data Protection Regulation []. The remaining 30 studies [,-,,,-,-,,-] did not reference adherence to a jurisdiction-specific regulatory standard, though some demonstrated efforts to ensure data privacy and security, suggesting alignment with regulatory principles.
Implementation Challenges and Frameworks for Clinical Data Technologies
Thirty-four studies identified a range of challenges associated with implementation (Table S1 in ) [,-,-,-,-]. Each challenge aligned with at least one of the identified thematic categories: (1) data quality, (2) data mapping, standardization and semantic harmonization, (3) technical, infrastructure and resource availability, (4) workflow integration and adoption, (5) interoperability and data integration across systems, (6) privacy, security and governance, (7) registry scope, generalizability and maintenance, and (8) semantic and temporal issues. The most commonly reported challenges included missing or incomplete data [,,,,,], inaccurate data input or extraction [-,-], significant time and resource demands associated with system development and implementation [,,,,,], poor staff buy-in, uptake, or adoption [,,,,], and the inability to capture events occurring outside the institution [,,,,]. One study broadly highlighted technical, policy, and governance challenges without providing specific details [].
Despite the breadth and volume of challenges identified, no studies explicitly reported the use of an implementation science framework to guide technological development and implementation. Instead, 3 studies referenced the use of formal software development methodologies, specifically the Agile Methodology [,] and a modified Software Development Life Cycle []. An additional 6 studies incorporated elements commonly aligned with implementation science frameworks, including iterative development and co-design, stakeholder involvement and collaborative problem-solving, embedded feedback loops, and rapid-cycle improvement processes [,,,,,].
Mapping Evidence and Contextual Factors for EMR-to-Registry Interoperability
To understand the current landscape of EMR-to-registry integration, evidence on reported technologies, data formats, and implementation challenges was synthesized. Table S1 in presents an evidence gap map summarizing these practices, highlighting the heterogeneity of solutions implemented across studies and the prevalence of implementation challenges, revealing areas that require targeted improvement to support successful integration. Importantly, the map also underscores gaps in reporting technologies used to transfer extracted data from EMRs to clinical registries, a critical component for achieving interoperability. These findings are contextualized in , which provides a consolidated overview of factors influencing technological selection, development, and implementation.

Discussion
Principal Findings
This review advances understanding of interoperability between EMRs and clinical registries by uniquely examining automated and sustainable solutions for data exchange, extending beyond prior work that has largely focused on technologies designed for isolated or study-specific data extraction. Through a synthesis of studies examining existing technological approaches, this review reveals that current approaches are highly variable, context-dependent, and inconsistently evaluated across studies. While existing studies provide important insight into how these processes can be tailored to meet the unique needs and constraints of each health care context, the absence of consistent quality reporting, robust evaluation, and methodological rigor meant no single approach could be recommended. Notably, none of the included studies used a rigorous or comparative study design to evaluate system effectiveness or impact, with most relying on descriptive approaches. This is further compounded by inconsistent and non-standardized reporting on data quality metrics, privacy, security, and regulatory compliance processes, as well as limited application of implementation science frameworks despite reported challenges. Collectively, these limitations restrict meaningful comparisons across studies, and by extension, the technological insights that could inform future implementation.
A novel contribution of this review was the synthesis of the context-specific considerations that influenced technology selection, as informed by reported implementations in the literature and summarized in . This synthesis revealed that a registry’s purpose and scope not only shaped decisions regarding what data were collected (input) and how they were structured, managed, and used (output), but were themselves reciprocally influenced by the broader institutional and jurisdictional regulatory environment. Collectively, these interrelated factors not only influenced the feasibility of implementation, but also had profound implications for data quality, degree of automation and interoperability, scalability, and long-term sustainability. These insights have direct implications for future implementations of interoperable technologies, guiding strategic design and governance decisions that promote sustainability, security, and regulatory compliance, while enabling the ethical and efficient reuse of health data for clinical care, quality improvement, and research.
One key factor influencing both the selection and performance of technologies was the data format, ranging from highly structured fields to unstructured free text. Most studies extracted data from structured formats, benefiting from their standardization, compatibility with coding systems, and designated entry fields that enhance data quality and completeness, especially when supported by mandatory fields and branching logic. However, more than 80% of clinical data exists in unstructured text [], which remains underused, highlighting the missed potential for richer data capture and integration. Unlike structured fields, free text enables nuanced narrative expressivity in clinical documentation, making its effective use crucial for comprehensive patient records and research insights []. While some studies used NLP for unstructured data extraction, wider adoption and continued advancement of artificial intelligence (AI)–driven techniques could further enhance the ability to extract meaningful insights from such data, ultimately improving both clinical decision-making and research potential [].
A key consideration in EMR research is the widespread concern regarding the reliability and usability of EMR-derived data for clinical research, as noted in previous literature [-]. The challenge remains in using an imperfect system, originally designed to support clinical care, financial billing, and insurance claims, for clinical research []. Missing data, varying levels of standardization, and a lack of harmonized terminologies can introduce the risk of biases, limit reproducibility, and hinder the integration of EMR-derived data into high-quality research and evidence-based practice. A balancing axiom suggests that data fields actively used, frequently accessed, and regularly reviewed in clinical workflows are more likely to demonstrate improved quality and reliability over time []. However, routine clinical use alone does not guarantee data quality and therefore does not negate the need for standardized and transparent evaluation practices. Addressing these foundational issues is essential to improving the credibility, use, and long-term value of EMR-linked registry systems in high-quality clinical research.
At a minimum, studies should report key data quality metrics such as completeness of data input into EMR systems, accuracy of data extracted using technological methods, and semantic consistency of data transferred onto registry systems. The use of existing frameworks that aim to assess data quality [-] and semantic consistency [,] can help standardize these processes and enhance methodological rigor. Where appropriate, these metrics should also be reported using quantifiable measures (eg, the percentage of missing data) and be accompanied by a clear description of the methods used to calculate them. This ensures that technologies in future research can be evaluated consistently, enabling reliable cross-study comparisons and supporting the development of safe, transparent, and evidence-based recommendations for improving EMR-to-registry interoperability.
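As an illustration of what quantifiable reporting could look like, the following minimal Python sketch computes two of the metrics named above: the percentage of missing values for a field and the percentage agreement between automatically extracted values and a manually validated reference. The function names, field names, and sample records are hypothetical and are not drawn from any included study; the definitions of "missing" and "accurate" are assumptions that a real study would need to state explicitly.

```python
def percent_missing(records: list[dict], field: str) -> float:
    """Percentage of records in which `field` is absent, None, or an empty string."""
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return 100.0 * missing / len(records)


def extraction_accuracy(extracted: list, reference: list) -> float:
    """Percentage of extracted values that exactly match a validated reference."""
    if not reference or len(extracted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for e, r in zip(extracted, reference) if e == r)
    return 100.0 * matches / len(reference)


# Hypothetical registry extract: 2 of 4 records lack an HbA1c value.
records = [
    {"patient_key": "A1", "hba1c": 7.2},
    {"patient_key": "B2", "hba1c": None},
    {"patient_key": "C3", "hba1c": 6.8},
    {"patient_key": "D4"},
]
missing_pct = percent_missing(records, "hba1c")

# Hypothetical diagnosis codes: automated extraction vs manual chart review.
accuracy_pct = extraction_accuracy(
    ["I10", "E11", "J45", "I10"],
    ["I10", "E11", "J44", "I10"],
)

print(f"Missing HbA1c: {missing_pct:.1f}%")
print(f"Extraction accuracy: {accuracy_pct:.1f}%")
```

Reporting the denominator and the matching rule (here, exact equality) alongside each percentage is what would allow such figures to be compared across studies.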
Alongside data quality concerns, inconsistent reporting on measures ensuring data privacy, security, and regulatory compliance highlights additional issues surrounding patient confidentiality and the protection of sensitive information. While these registries enable collaborative data use that enhances both research and clinical outcomes, which reciprocally benefit patients, they introduce legal and ethical considerations regarding patient rights and consent, particularly when data are repurposed for research without explicit authorization [-]. Common use of opt-out procedures and ethical waivers of consent, particularly when de-identified data are used, has facilitated large-scale data access for research while minimizing direct patient burden [,]. However, these approaches raise ethical concerns pertaining to transparency, patient autonomy, and privacy risks, especially when the distinction between de-identified and re-identifiable data has not been clearly articulated []. True de-identification is not possible in registries where data pertaining to the same patient are entered at multiple timepoints. In such cases, pseudonymization is commonly used, replacing patient identifiers with coded keys that enable longitudinal linkage of records while preserving patient confidentiality. The secure management of these re-identification keys is therefore a crucial governance consideration, balancing the need for data use in longitudinal research with the ethical imperative to protect patient privacy and autonomy. Notably, no patient perspectives were captured in any of the studies reviewed, revealing a significant gap regarding patient experiences and their views on the reuse of their data in clinical registries []. Similarly, inconsistencies in the use of secure transfer methodologies and poor reporting of jurisdiction-specific regulatory compliance highlight potential risks in integrating and harmonizing data from EMRs to clinical registries. These issues underscore the critical necessity for improved transparency and rigorous adherence to governance frameworks to ensure that data exchange between EMRs and clinical registries remains secure, ethical, and aligned with regulatory standards across diverse health data ecosystems.
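To make the pseudonymization approach described above concrete, the sketch below shows one possible way to derive a stable coded key from a patient identifier using a keyed hash (HMAC-SHA256) from the Python standard library. This is an assumption for illustration only; the included studies did not report this specific method. The secret key, the `mrn` field, and the sample visits are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for `patient_id` derived with a secret key.

    The same identifier and key always yield the same pseudonym, which is
    what allows records from different timepoints to be linked.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be held only by the data custodian.
SECRET_KEY = b"hypothetical-key-held-by-data-custodian"

visit_1 = {"mrn": "MRN-0001", "date": "2024-01-10"}
visit_2 = {"mrn": "MRN-0001", "date": "2024-06-02"}
visit_other = {"mrn": "MRN-0002", "date": "2024-06-02"}

pseudo_1 = pseudonymize(visit_1["mrn"], SECRET_KEY)
pseudo_2 = pseudonymize(visit_2["mrn"], SECRET_KEY)
pseudo_other = pseudonymize(visit_other["mrn"], SECRET_KEY)

# Records from the same patient link; records from different patients do not.
print(pseudo_1 == pseudo_2)
print(pseudo_1 == pseudo_other)
```

Because a keyed hash is one-way, re-identification in this design would depend on the custodian retaining the key (and, typically, a protected lookup table), which is why the governance of that key matters as much as the algorithm itself.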
Given the sensitivity of health information, technical safeguards must be in place to maintain patient privacy, comply with regulatory requirements, and build trust among stakeholders. Several established and emerging technical solutions are being used to address these concerns. Encryption, both at rest and in transit, is a foundational measure that ensures data cannot be accessed or intercepted without authorization []. Role-based access controls and audit logging further enhance security by restricting access to authorized personnel and maintaining traceability of data interactions []. In some cases, federated data models and privacy-preserving record linkage allow for data analysis across institutions without centralizing identifiable information, thereby minimizing exposure []. The use of secure cloud infrastructure, compliant with standards such as ISO 27001, HIPAA, and General Data Protection Regulation, has also become more prevalent, offering scalable and resilient environments for data storage and processing []. Additionally, data governance frameworks play a critical role in defining policies for data access, sharing, and reuse, ensuring that ethical and legal considerations are upheld []. Emerging technologies such as homomorphic encryption and differential privacy offer promising avenues for enabling secure computation on sensitive data without compromising individual privacy []. While these approaches are still maturing, they represent important innovations in the field of health data protection.
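As a simple illustration of the differential privacy concept mentioned above, the following sketch releases a registry count with Laplace noise added. It is a teaching example under stated assumptions rather than a production mechanism, and it does not reflect any implementation reported in the included studies. The function names, the privacy parameter `epsilon`, and the example count are hypothetical; the sensitivity of 1 assumes that adding or removing one patient changes the count by at most 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centered at zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return `true_count` plus Laplace noise calibrated to `epsilon`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # assumption: one patient changes the count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical query: number of registry patients with a given diagnosis.
rng = random.Random(42)  # seeded only to make the illustration repeatable
released = noisy_count(128, epsilon=0.5, rng=rng)
print(f"Released count: {released:.1f}")
```

Smaller values of `epsilon` add more noise and therefore give stronger privacy at the cost of accuracy, which is the trade-off a registry would need to weigh before adopting such a technique.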
Efforts to improve data quality and regulatory compliance must also minimize clinical burden [,]. The variety of implementation challenges reported across studies highlights the complexity of integrating these tools into clinical and registry workflows, with common barriers including data quality concerns, limited staff engagement, high resource demands for system development, and poor alignment with existing institutional infrastructure and workflows. Despite these challenges, there was a paucity of studies explicitly applying an implementation science framework. This theoretical gap likely contributes to the fragmented nature of reporting and hinders the successful adoption, integration, and long-term sustainability of EMR-to-registry integration. The use of process-driven models and determinant frameworks is a well-established antecedent to successful technology adoption [-]. A multidisciplinary, co-designed, implementation-informed approach is needed to support alignment with clinical workflows and maximize research use while minimizing disruptions to patient care [,].
While automated technologies offer significant advantages over manual data entry, their feasibility is heavily dependent on local financial and technical capacity. Most studies originated from high-income countries with mature EMR systems and infrastructure. In contrast, low- and middle-income countries reportedly faced persistent barriers stemming from limited funding and infrastructure capacity [,]. The implementation of EMRs and the subsequent development and maintenance of clinical registries is well recognized as a costly and complex process, requiring substantial investment in system design, infrastructure setup, maintenance, staff training, data security, and interoperability [-]. Beyond the initial setup, registries also incur ongoing costs that are not typically funded by traditional research project grants, adding a further element of complexity to their implementation and long-term sustainability across organizations. This presents a major barrier for low- and middle-income countries, where differences in health system models, funding structures, and technological infrastructure further limit the feasibility of widespread EMR integration. Even in high-income countries such as Australia, the roll-out of interoperable registries is complicated by the fact that most states have adopted different EMR solutions with varying levels of implementation across hospitals, despite the availability of funding. These disparities threaten to widen global gaps in clinical research capacity, data-driven decision-making, and ultimately patient outcomes.
Limitations of This Study
This review has several limitations. Heterogeneity in study designs and inconsistent reporting of technology characteristics constrained the potential for direct comparison and meta-analysis. Only English-language studies published between January 2013 and April 2025 were included. Excluding gray literature may have resulted in the underrepresentation of relevant real-world technological implementations, particularly outside high-income countries, and could potentially bias findings toward well-resourced settings. Nonindexed sources may also have been missed as they were not systematically searched outside the selected databases. Technologies designed for single-study use, which may have provided additional insights, were also excluded. Study authors were not contacted for clarification, and all interpretations relied solely on the information presented in the published manuscripts. Furthermore, many included studies are situated within specific national contexts (eg, SS-MIX2 in Japan), which may limit the broader international generalizability of the findings.
Conclusions
This review provides insight into the diverse strategies that facilitate interoperability between EMRs and clinical registries. While technological advancements have reduced the need for manual data extraction and transfer onto clinical registries, challenges remain in ensuring access, consistency, security, and sustainability across diverse health care settings. Given the ethical, legal, and regulatory complexities of reusing clinical data for research, future initiatives must operate within robust governance frameworks to safeguard data security, protect patient privacy, and ensure compliance with both institutional and jurisdictional policies while promoting transparency, accountability, and equitable access to research opportunities. Implementation efforts should be tailored to local contexts, guided by implementation science frameworks, and informed by meaningful engagement with end users to support adoption. Overcoming these barriers, alongside continued innovation in interoperability solutions, will be essential to maximizing the potential and impact of EMR-linked registries to drive high-quality research, strengthen clinical decision-making, and ultimately improve patient outcomes.
Acknowledgments
We would like to acknowledge Justin Clarke, Senior Research Information Specialist from the Faculty of Health Sciences and Medicine at Bond University, Queensland, Australia, for his valuable contribution in refining the search strategy and applying the Systematic Review Accelerator Polyglot Search Translator automation tool, which was used to convert the final search strategy into appropriate formats for four databases.
Funding
This review was supported by the Walker Pediatric Research Fellowship awarded by the Pediatric Surgical Department within Children’s Health Queensland, Hospital and Health Service. This support was presented in the form of a higher degree by research stipend, with no involvement in the study design, data collection and analysis, decision to publish or preparation of the manuscript. The authors have no conflicts of interest to disclose.
Authors' Contributions
EH conceptualized and designed the study and search strategy; led screening, data collection, synthesis, and interpretation of findings; drafted the initial manuscript, reviewed and revised the manuscript; and approved the final manuscript as submitted. JB contributed to the design of the search strategy, conducted study screening, data collection, analysis, synthesis, and interpretation; contributed to the initial manuscript, critically reviewed and revised the manuscript; and approved the final manuscript as submitted. JS and RM contributed to the technological understanding, synthesis, and interpretation of findings; critically reviewed and revised the manuscript; and approved the final manuscript as submitted. RMF and KAD conducted screening and data extraction, critically reviewed the manuscript; and approved the final manuscript as submitted. AD, KP, CMB, and RK contributed to the conceptualization, synthesis, and interpretation of the study; critically reviewed and revised the manuscript and approved the final manuscript as submitted. BG contributed to the conceptualization and design of the study, supervised screening, data collection, analysis, and interpretation, critically reviewed and revised the manuscript, and approved the final manuscript as submitted.
Conflicts of Interest
None declared.
Multimedia Appendix 4
Summary of electronic medical record and registry infrastructure, technologies, and interoperability approaches.
DOCX File, 71 KB
Multimedia Appendix 5
Summary of data quality measures following data input, extraction, and transfer.
DOCX File, 71 KB
Multimedia Appendix 8
Summary of implementation challenges identified in establishing electronic medical record-to-registry interoperability.
DOCX File, 109 KB
Multimedia Appendix 9
Evidence gap map of technologies, data formats, and implementation challenges in electronic medical record-to-registry integration.
DOCX File, 88 KB
References
- Yeung AWK, Torkamani A, Butte AJ, et al. The promise of digital healthcare technologies. Front Public Health. 2023;11:1196596. [CrossRef] [Medline]
- Alzu’bi AA, Watzlaf VJM, Sheridan P. Electronic health record (EHR) abstraction. Perspect Health Inf Manag. 2021;18(Spring):1g. [Medline]
- Klaiman T, Pracilio V, Kimberly L, Cecil K, Legnini M. Leveraging effective clinical registries to advance medical care quality and transparency. Popul Health Manag. Apr 2014;17(2):127-133. [CrossRef] [Medline]
- Hoque DME, Kumari V, Hoque M, Ruseckaite R, Romero L, Evans SM. Impact of clinical registries on quality of patient care and clinical outcomes: a systematic review. PLoS ONE. 2017;12(9):e0183667. [CrossRef] [Medline]
- Olaronke I, Soriyan H, Gambo I, Olaleke J. Interoperability in healthcare: benefits, challenges and resolutions. Int J Innov Appl Stud. Apr 2013;3:2028-9324. URL: https://ijias.issr-journals.org/abstract.php?article=IJIAS-13-090-01 [Accessed 2026-05-20]
- de Mello BH, Rigo SJ, da Costa CA, et al. Semantic interoperability in health records standards: a systematic literature review. Health Technol (Berl). 2022;12(2):255-272. [CrossRef] [Medline]
- Manion FJ, Harris MR, Buyuktur AG, Clark PM, An LC, Hanauer DA. Leveraging EHR data for outcomes and comparative effectiveness research in oncology. Curr Oncol Rep. Dec 2012;14(6):494-501. [CrossRef] [Medline]
- Nordo AH, Levaux HP, Becnel LB, et al. Use of EHRs data for clinical research: historical progress and current applications. Learn Health Syst. Jan 2019;3(1):e10076. [CrossRef] [Medline]
- Pop B, Fetica B, Blaga ML, et al. The role of medical registries, potential applications and limitations. Med Pharm Rep. Jan 2019;92(1):7-14. [CrossRef] [Medline]
- Filkins BL, Kim JY, Roberts B, et al. Privacy and security in the era of digital health: what should translational researchers know and do about it? Am J Transl Res. 2016;8(3):1560-1580. [Medline]
- Cowie MR, Blomster JI, Curtis LH, et al. Electronic health records to facilitate clinical research. Clin Res Cardiol. Jan 2017;106(1):1-9. [CrossRef] [Medline]
- Negro-Calduch E, Azzopardi-Muscat N, Krishnamurthy RS, Novillo-Ortiz D. Technological progress in electronic health record system optimization: systematic review of systematic literature reviews. Int J Med Inform. Aug 2021;152:104507. [CrossRef] [Medline]
- Honavar SG. Electronic medical records - the good, the bad and the ugly. Indian J Ophthalmol. Mar 2020;68(3):417-418. [CrossRef] [Medline]
- Bodagh N, Archbold RA, Weerackody R, et al. Feasibility of real-time capture of routine clinical data in the electronic health record: a hospital-based, observational service-evaluation study. BMJ Open. Mar 8, 2018;8(3):e019790. [CrossRef] [Medline]
- D’Agnolo HM, Kievit W, Andrade RJ, Karlsen TH, Wedemeyer H, Drenth JP. Creating an effective clinical registry for rare diseases. United European Gastroenterol J. Jun 2016;4(3):333-338. [CrossRef] [Medline]
- Solomon DJ, Henry RC, Hogan JG, Van Amburg GH, Taylor J. Evaluation and implementation of public health registries. Public Health Rep. 1991;106(2):142-150. [Medline]
- Dreyer NA, Garner S. Registries for robust evidence. JAMA. Aug 19, 2009;302(7):790-791. [CrossRef] [Medline]
- Chen AM, Kupelian PA, Wang PC, Steinberg ML. Development of a radiation oncology-specific prospective data registry for research and quality improvement: a clinical workflow-based solution. JCO Clin Cancer Inform. Dec 2018;2:1-9. [CrossRef] [Medline]
- Kariuki JM, Manders EJ, Richards J, et al. Automating indicator data reporting from health facility EMR to a national aggregate data system in Kenya: an interoperability field-test using OpenMRS and DHIS2. Online J Public Health Inform. 2016;8(2):e188. [CrossRef] [Medline]
- Pittman CA, Miranpuri AS. Neurosurgery clinical registry data collection utilizing informatics for integrating biology and the bedside and electronic health records at the University of Rochester. Neurosurg Focus. Dec 2015;39(6):26621414. [CrossRef] [Medline]
- Kannan V, Fish JS, Mutz JM, et al. Rapid development of specialty population registries and quality measures from electronic health record data: an agile framework. Methods Inf Med. Jun 14, 2017;56(99):e74-e83. [CrossRef] [Medline]
- Kapoor R, Sleeman WC 4th, Nalluri JJ, et al. Automated data abstraction for quality surveillance and outcome assessment in radiation oncology. J Appl Clin Med Phys. Jul 2021;22(7):177-187. [CrossRef] [Medline]
- Shalhout SZ, Saqlain F, Wright K, Akinyemi O, Miller DM. Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry. JAMIA Open. Apr 2022;5(1):ooab118. [CrossRef] [Medline]
- Williams A, Goedicke W, Tissera KA, Mankarious LA. Leveraging existing tools in electronic health record systems to automate clinical registry compilation. Otolaryngol Head Neck Surg. Mar 2020;162(3):408-409. [CrossRef] [Medline]
- Lee M, Kim K, Shin Y, Lee Y, Kim TJ. Advancements in electronic medical records for clinical trials: enhancing data management and research efficiency. Cancers (Basel). May 2, 2025;17(9):1552. [CrossRef] [Medline]
- Mueller C, Herrmann P, Cichos S, et al. Automated electronic health record to electronic data capture transfer in clinical studies in the German health care system: feasibility study and gap analysis. J Med Internet Res. Aug 4, 2023;25:e47958. [CrossRef] [Medline]
- Holmes JH, Beinlich J, Boland MR, et al. Why is the electronic health record so challenging for research and clinical care? Methods Inf Med. May 2021;60(1-02):32-48. [CrossRef] [Medline]
- van Velthoven MH, Mastellos N, Majeed A, O’Donoghue J, Car J. Feasibility of extracting data from electronic medical records for research: an international comparative study. BMC Med Inform Decis Mak. Jul 13, 2016;16(1):90. [CrossRef] [Medline]
- Trinkley KE, Maw AM, Torres CH, Huebschmann AG, Glasgow RE. Applying implementation science to advance electronic health record-driven learning health systems: case studies, challenges, and recommendations. J Med Internet Res. Oct 7, 2024;26:e55472. [CrossRef] [Medline]
- Bauer MS, Kirchner J. Implementation science: what is it and why should I care? Psychiatry Res. Jan 2020;283:112376. [CrossRef] [Medline]
- Sucharew H, Macaluso M. Progress notes: methods for research evidence synthesis: the scoping review approach. J Hosp Med. Jul 1, 2019;14(7):416-418. [CrossRef] [Medline]
- Tricco AC, Lillie E, Zarin W, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. Oct 2, 2018;169(7):467-473. [CrossRef] [Medline]
- Dulay E. Current capacity of interoperability between clinical registries and patient electronic medical records: a scoping review. Open Science Framework (OSF). Preprint posted online on Oct 23, 2023. URL: https://doi.org/10.17605/OSF.IO/83W76 [Accessed 2026-05-20]
- Clark JM, Sanders S, Carter M, et al. Improving the translation of search strategies using the Polyglot Search Translator: a randomized controlled trial. J Med Libr Assoc. Apr 2020;108(2):195-207. [CrossRef] [Medline]
- Veritas Health Innovation. Covidence - better systematic review management. Covidence. URL: https://www.covidence.org/ [Accessed 2026-05-20]
- Microsoft Excel. Microsoft 365. URL: https://office.microsoft.com/excel [Accessed 2026-05-20]
- Abu-Rish Blakeney E, Wolpin S, Lavallee DC, Dardas T, Cheng R, Zierler B. Developing and implementing a heart failure data mart for research and quality improvement. Inform Health Soc Care. 2019;44(2):164-175. [CrossRef] [Medline]
- Bacchi S, Gluck S, Koblar S, Jannes J, Kleinig T. Automated information extraction from free-text medical documents for stroke key performance indicators: a pilot study. Intern Med J. Feb 2022;52(2):315-317. [CrossRef] [Medline]
- Heider PM, Pipaliya RM, Meystre SM. A natural language processing tool offering data extraction for COVID-19 related information (DECOVRI). Stud Health Technol Inform. Jun 6, 2022;290:1062-1063. [CrossRef] [Medline]
- Li N, Zhu Q, Dang Y, et al. Development and implementation of a dynamically updated big data intelligence platform using electronic medical records for secondary hypertension. Rev Cardiovasc Med. Mar 2024;25(3):104. [CrossRef] [Medline]
- Milinovich A, Kattan MW. Extracting and utilizing electronic health data from Epic for research. Ann Transl Med. Feb 2018;6(3):42. [CrossRef] [Medline]
- Mou Z, Sitapati AM, Ramachandran M, Doucet JJ, Liepert AE. Development and implementation of an automated electronic health record-linked registry for emergency general surgery. J Trauma Acute Care Surg. Aug 1, 2022;93(2):273-279. [CrossRef] [Medline]
- Mou Y, Lehmkuhl J, Sauerbrunn N, et al. Improving the quality of unstructured cancer data using large language models: a German oncological case study. Stud Health Technol Inform. Aug 22, 2024;316:685-689. [CrossRef] [Medline]
- Munzone E, Marra A, Comotto F, et al. Development and validation of a natural language processing algorithm for extracting clinical and pathological features of breast cancer from pathology reports. JCO Clin Cancer Inform. Aug 2024;8:e2400034. [CrossRef] [Medline]
- Nathan JK, Foley J, Hoang T, et al. The stroke navigator: meaningful use of the electronic health record to efficiently report inpatient stroke care quality. J Am Med Inform Assoc. Nov 1, 2018;25(11):1534-1539. [CrossRef] [Medline]
- Pan HY, Shaitelman SF, Perkins GH, Schlembach PJ, Woodward WA, Smith BD. Implementing a real-time electronic data capture system to improve clinical documentation in radiation oncology. J Am Coll Radiol. Apr 2016;13(4):401-407. [CrossRef] [Medline]
- Rayman S, Benvenisti H, Westrich G, Schtrechman G, Nissan A, Segev L. Colorectal surgery surveillance: a novel method for composing an automated real-time prospective registry. Isr Med Assoc J. Apr 2021;23(4):239-244. [Medline]
- Rubio-Mayo P, Ojeda-Thies C, Jiménez-Cerezo MJ, Garcia-Barrio N, Cruz-Bermúdez JL, Pedrera-Jiménez M. HCE2RNFC: an efficient methodology for reusing the EHR in the Spanish National Hip Fracture Registry. Stud Health Technol Inform. Aug 22, 2024;316:1422-1426. [CrossRef] [Medline]
- Salati M, Pompili C, Refai M, Xiumè F, Sabbatini A, Brunelli A. Real-time database drawn from an electronic health record for a thoracic surgery unit: high-quality clinical data saving time and human resources. Eur J Cardiothorac Surg. Jun 2014;45(6):1017-1019. [CrossRef] [Medline]
- Wang B, Lai J, Cao H, et al. Enhancing the interoperability and transparency of real-world data extraction in clinical research: evaluating the feasibility and impact of a ChatGLM implementation in Chinese hospital settings. Eur Heart J Digit Health. Nov 2024;5(6):712-724. [CrossRef] [Medline]
- Wulff A, Mast M, Hassler M, Montag S, Marschollek M, Jack T. Designing an openEHR-based pipeline for extracting and standardizing unstructured clinical data using natural language processing. Methods Inf Med. Dec 2020;59(S 02):e64-e78. [CrossRef] [Medline]
- Stevens A, Karki S, Shivers E, et al. SmartChart Suite: a Fast Healthcare Interoperability Resources-based framework for longitudinal syphilis surveillance using structured and unstructured data. JAMIA Open. Feb 2025;8(1):ooae145. [CrossRef] [Medline]
- Cheng AC, Duda SN, Taylor R, et al. REDCap on FHIR: clinical data interoperability services. J Biomed Inform. Sep 2021;121:103871. [CrossRef] [Medline]
- González L, Pérez-Rey D, Alonso E, et al. Building an I2B2-based population repository for clinical research. Stud Health Technol Inform. Jun 16, 2020;270:78-82. [CrossRef] [Medline]
- Nasir K, Gullapelli R, Nicolas JC, et al. Houston Methodist cardiovascular learning health system (CVD-LHS) registry: methods for development and implementation of an automated electronic medical record-based registry using an informatics framework approach. Am J Prev Cardiol. Jun 2024;18:100678. [CrossRef] [Medline]
- Valencia Morales DJ, Bansal V, Heavner SF, et al. Validation of automated data abstraction for SCCM discovery VIRUS COVID-19 registry: practical EHR export pathways (VIRUS-PEEP). Front Med (Lausanne). 2023;10:1089087. [CrossRef] [Medline]
- Dong Y, Fang K, Wang X, et al. The network of Shanghai Stroke Service System (4S): a public health-care web-based database using automatic extraction of electronic medical records. Int J Stroke. Jul 2018;13(5):539-544. [CrossRef] [Medline]
- Goel AK, Campbell WS, Moldwin R. Structured data capture for oncology. JCO Clin Cancer Inform. Feb 2021;5:194-201. [CrossRef] [Medline]
- Tavabi N, Pruneski J, Golchin S, et al. Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline. Artif Intell Med. May 2024;151:102847. [CrossRef] [Medline]
- Sugiyama T, Miyo K, Tsujimoto T, et al. Design of and rationale for the Japan Diabetes compREhensive database project based on an Advanced electronic Medical record System (J-DREAMS). Diabetol Int. Nov 2017;8(4):375-382. [CrossRef] [Medline]
- Nakagawa N, Sofue T, Kanda E, et al. J-CKD-DB: a nationwide multicentre electronic health record-based chronic kidney disease database in Japan. Sci Rep. Apr 30, 2020;10(1):7351. [CrossRef] [Medline]
- Garies S, Cummings M, Forst B, et al. Achieving quality primary care data: a description of the Canadian Primary Care Sentinel Surveillance Network data capture, extraction, and processing in Alberta. Int J Popul Data Sci. Jul 29, 2019;4(2):1132. [CrossRef] [Medline]
- Dalhatu I, Aniekwe C, Bashorun A, et al. From paper files to web-based application for data-driven monitoring of HIV programs: Nigeria’s journey to a national data repository for decision-making and patient care. Methods Inf Med. Sep 2023;62(3-04):130-139. [CrossRef] [Medline]
- Miyake M, Akiyama M, Kashiwagi K, Sakamoto T, Oshika T. Japan Ocular Imaging Registry: a national ophthalmology real-world database. Jpn J Ophthalmol. Nov 2022;66(6):499-503. [CrossRef] [Medline]
- AbuHalimeh A. Improving data quality in clinical research informatics tools. Front Big Data. 2022;5:871897. [CrossRef] [Medline]
- Rosenbloom ST, Denny JC, Xu H, Lorenzi N, Stead WW, Johnson KB. Data from clinical notes: a perspective on the tension between structure and flexible documentation. J Am Med Inform Assoc. 2011;18(2):181-186. [CrossRef] [Medline]
- Rubinger L, Gazendam A, Ekhtiari S, Bhandari M. Machine learning and artificial intelligence in research and healthcare. Injury. May 2023;54 Suppl 3:S69-S73. [CrossRef] [Medline]
- Syed R, Eden R, Makasi T, et al. Digital health data quality issues: systematic review. J Med Internet Res. Mar 31, 2023;25:e42615. [CrossRef] [Medline]
- Garza M, Myneni S, Nordo A, et al. eSource for standardized health information exchange in clinical research: a systematic review. Stud Health Technol Inform. 2019;257:115-124. [Medline]
- Gianfrancesco MA, Goldstein ND. A narrative review on the validity of electronic health record-based research in epidemiology. BMC Med Res Methodol. Oct 27, 2021;21(1):234. [CrossRef] [Medline]
- Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. Jan 1, 2013;20(1):144-151. [CrossRef] [Medline]
- Reimer AP, Milinovich A, Madigan EA. Data quality assessment framework to assess electronic medical record data for use in research. Int J Med Inform. Jun 2016;90:40-47. [CrossRef] [Medline]
- Lewis AE, Weiskopf N, Abrams ZB, et al. Electronic health record data quality assessment and tools: a systematic review. J Am Med Inform Assoc. Sep 25, 2023;30(10):1730-1740. [CrossRef] [Medline]
- Facile R, Chronaki C, van Reusel P, Kush R. Standards in sync: five principles to achieve semantic interoperability for TRUE research for healthcare. Front Digit Health. 2025;7:1567624. [CrossRef] [Medline]
- Kotwal S, Webster AC, Cass A, Gallagher M. A review of linked health data in Australian nephrology. Nephrology (Carlton). Jun 2016;21(6):457-466. [CrossRef] [Medline]
- Jabour AM. Putting patients at the center of health information exchange design: An exploration of patient preferences for data sharing. Health Informatics J. 2024;30(3):14604582241277029. [CrossRef] [Medline]
- Tertulino R, Antunes N, Morais H. Privacy in electronic health records: a systematic mapping study. J Public Health (Berl). Mar 2024;32(3):435-454. [CrossRef]
- de Man Y, Wieland-Jorna Y, Torensma B, et al. Opt-in and opt-out consent procedures for the reuse of routinely recorded health data in scientific research and their consequences for consent rate and consent bias: systematic review. J Med Internet Res. Feb 28, 2023;25:e42131. [CrossRef] [Medline]
- Md Emdadul Hoque D, Ruseckaite R, Lorgelly P, McNeil JJ, Evans SM. Cross-sectional study of characteristics of clinical registries in Australia: a resource for clinicians and policy makers. Int J Qual Health Care. Apr 1, 2018;30(3):192-199. [CrossRef] [Medline]
- Fesl S, Lang C, Schmitt J, et al. Factors influencing patients’ willingness to share their digital health data for primary and secondary use: a theory- and evidence-based overview of reviews. Digit Health. 2025;11:20552076251340254. [CrossRef] [Medline]
- Mehrtak M, SeyedAlinaghi S, MohsseniPour M, et al. Security challenges and solutions using healthcare cloud computing. J Med Life. 2021;14(4):448-461. [CrossRef] [Medline]
- de Carvalho Junior MA, Bandiera-Paiva P. Health information system role-based access control current security trends and challenges. J Healthc Eng. 2018;2018:6510249. [CrossRef] [Medline]
- Baumgartner M, Kreiner K, Lauschensky A, et al. Health data space nodes for privacy-preserving linkage of medical data to support collaborative secondary analyses. Front Med (Lausanne). 2024;11:1301660. [CrossRef] [Medline]
- Abraham R, Schneider J, vom Brocke J. Data governance: a conceptual framework, structured review, and research agenda. Int J Inf Manage. Dec 2019;49:424-438. [CrossRef]
- Shin H, Ryu K, Kim JY, Lee S. Application of privacy protection technology to healthcare big data. Digit Health. 2024;10:20552076241282242. [CrossRef] [Medline]
- Alumran A, Aljuraifani SA, Almousa ZA, et al. The influence of electronic health record use on healthcare providers burnout. Informatics in Medicine Unlocked. 2024;50:101588. [CrossRef]
- Murad MH, Vaa Stelling BE, West CP, et al. Measuring documentation burden in healthcare. J Gen Intern Med. Nov 2024;39(14):2837-2848. [CrossRef] [Medline]
- Delaforce A, Li J, Niven P, et al. Using the CFIR-ERIC approach to enhance the uptake of a digital fall prevention platform: a quasi-experimental pre-post implementation study. 2025. [CrossRef]
- Rouleau G, Wu K, Ramamoorthi K, et al. Mapping theories, models, and frameworks to evaluate digital health interventions: scoping review. J Med Internet Res. Feb 5, 2024;26:e51098. [CrossRef] [Medline]
- Greenhalgh T, Wherton J, Papoutsi C, et al. Beyond adoption: a new framework for theorizing and evaluating nonadoption, abandonment, and challenges to the scale-up, spread, and sustainability of health and care technologies. J Med Internet Res. Nov 1, 2017;19(11):e367. [CrossRef] [Medline]
- Li J, Maddock E, Hosking M, et al. Identifying and optimizing factors influencing the implementation of a fast healthcare interoperability resources accelerator: qualitative study using the consolidated framework for implementation research-expert recommendations for implementing change approach. JMIR Med Inform. May 27, 2025;13:e66421. [CrossRef] [Medline]
- Griffin BR, Trenoweth K, Dimanopoulos TA, et al. Co-design of a paediatric post-trauma electronic psychosocial screen. J Pediatr Nurs. 2024;76:52-60. [CrossRef] [Medline]
- Dimanopoulos MTA, Trenoweth MK, De Young AC, Kimble R, Griffin BR. The acceptability, feasibility and adoption of a co-designed electronic post-injury psychosocial screening tool for carers of children admitted to hospital following injury. J Pediatr Nurs. 2025;81:155-164. [CrossRef] [Medline]
- Modi S, Feldman SS. The value of electronic health records since the health information technology for economic and clinical health act: systematic review. JMIR Med Inform. Sep 27, 2022;10(9):e37283. [CrossRef] [Medline]
- Aguirre RR, Suarez O, Fuentes M, Sanchez-Gonzalez MA. Electronic health record implementation: a review of resources and tools. Cureus. Sep 13, 2019;11(9):e5649. [CrossRef] [Medline]
- Al Ani M, Garas G, Hollingshead J, Cheetham D, Athanasiou T, Patel V. Which electronic health record system should we use? A systematic review. Med Princ Pract. 2022;31(4):342-351. [CrossRef] [Medline]
Abbreviations
| AI: artificial intelligence |
| API: application programming interface |
| APPI: Act on the Protection of Personal Information |
| CDIS: Clinical Data Interoperability Services |
| CDISC: Clinical Data Interchange Standards Consortium |
| EHR: electronic health record |
| EMR: electronic medical record |
| ETL: extract, transform, load |
| FHIR: Fast Healthcare Interoperability Resources |
| HIPAA: Health Insurance Portability and Accountability Act |
| i2b2: Informatics for Integrating Biology & the Bedside |
| ICD-10: International Statistical Classification of Diseases and Related Health Problems, Tenth Revision |
| IRB: institutional review board |
| LLM: large language model |
| LOINC: Logical Observation Identifiers Names and Codes |
| MFA: multifactor authentication |
| NER: named entity recognition |
| NLP: natural language processing |
| PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses |
| PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews |
| REDCap: Research Electronic Data Capture |
| SNOMED CT: Systematized Nomenclature of Medicine – Clinical Terms |
| SQL: structured query language |
| SS-MIX2: Standardized Structured Medical Information exchange version 2 |
Edited by Stefano Brini; submitted 14.Aug.2025; peer-reviewed by Abhishek Shivanna, Gabriela Costa; final revised version received 14.Jan.2026; accepted 15.Jan.2026; published 25.May.2026.
Copyright © Erika Haynes, James Brannigan, Jessica Suna, Reid Malseed, Alana Delaforce, Rachel Mulvenney-Fenner, Katherine Alog-Daroya, Karin Plummer, Craig McBride, Roy Kimble, Bronwyn Griffin. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.May.2026.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

